A FRAMEWORK

FOR APPLYING SUBGRADIENT METHODS
TO CONIC OPTIMIZATION PROBLEMS
(version 2)*

JAMES RENEGAR 


Abstract. A framework is presented whereby a general convex conic optimization
problem is transformed into an equivalent convex optimization problem
whose only constraints are linear equations and whose objective function
is Lipschitz continuous. Virtually any subgradient method can be applied to
solve the equivalent problem. Two methods are analyzed.


1. Introduction 

Given a conic optimization problem for which a strictly feasible point is known, 
we provide a transformation to an equivalent convex optimization problem which 
is of the same dimension, has only linear equations as constraints (one more linear 
equation than the original problem), and has Lipschitz-continuous objective function
defined on the whole space. Virtually any subgradient method can be applied
to solve the equivalent problem, the cost per iteration dominated by computation 
of a subgradient and its orthogonal projection onto a subspace (the same subspace 
at every iteration, a situation for which preprocessing is effective). 

We develop representative complexity results for two methods, one of which is 
executable under an ideal circumstance (knowing the optimal value), but the other 
of which is general (requiring only that a strictly feasible point be known). 

Perhaps most surprising is that the transformation to an equivalent problem is 
simple and so is the basic theory, and yet the approach has been overlooked until 
now, a blind spot. 

The following section presents the transformation and basic theory. Representative
algorithmic implications are developed in Section 3. A general example is
presented in Section 4, highlighting key differences with traditional literature on 
subgradient methods. 

This paper significantly extends subgradient-method results first reported in 
[3] (as well as results in the previously posted version of the present paper). A 
companion paper will encompass the accelerated gradient-method results reported 
in [3]. The general theory presented in the following section is the foundation for 
each paper. 


2. Basic Theory 

The theory given here is elementary and yet has been overlooked, the “blind 
spot” referred to above. 

Special thanks to Yurii Nesterov for encouragement at an early, critical stage of the research. 
And thanks to Rob Freund, whose correspondence sparked the realization of how to best partition 
the results into two papers. 

* The development of algorithms has been streamlined and considerably strengthened. 
Let $\mathcal{E}$ be a finite-dimensional real Euclidean space. Let $\mathcal{K} \subset \mathcal{E}$ be a proper,
closed, convex cone with non-empty interior.

Fix a vector $e \in \mathrm{int}(\mathcal{K})$ (interior). We refer to $e$ as the “distinguished direction.”
For each $x \in \mathcal{E}$, let

$$ \lambda_{\min}(x) := \inf\{\lambda : x - \lambda e \notin \mathcal{K}\} , $$

that is, the scalar $\lambda$ for which $x - \lambda e$ lies in the boundary of $\mathcal{K}$. (Existence and
uniqueness of $\lambda_{\min}(x)$ follow from $e \in \mathrm{int}(\mathcal{K}) \ne \mathcal{E}$ and convexity of $\mathcal{K}$.)

If, for example, $\mathcal{E} = \mathbb{S}^n$ ($n \times n$ symmetric matrices), $\mathcal{K} = \mathbb{S}^n_+$ (cone of positive
semidefinite matrices), and $e = I$ (the identity), then $\lambda_{\min}(X)$ is the minimum
eigenvalue of $X$.

On the other hand, if $\mathcal{K} = \mathbb{R}^n_+$ (non-negative orthant) and $e$ is a vector with all
positive coordinates, then $\lambda_{\min}(x) = \min_j x_j/e_j$ for $x \in \mathbb{R}^n$. Clearly, the value of
$\lambda_{\min}(x)$ depends on the distinguished direction $e$ (a fact the reader should keep in
mind since the notation does not reflect the dependence).

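Obviously, $\mathcal{K} = \{x : \lambda_{\min}(x) \ge 0\}$ and $\mathrm{int}(\mathcal{K}) = \{x : \lambda_{\min}(x) > 0\}$. Also,

$$ \lambda_{\min}(sx + te) = s\,\lambda_{\min}(x) + t \quad \text{for all } x \in \mathcal{E} \text{ and scalars } s \ge 0,\ t . \tag{2.1} $$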
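As a small numerical illustration of the orthant case and of identity (2.1), the following Python sketch evaluates $\lambda_{\min}(x) = \min_j x_j/e_j$ and checks $\lambda_{\min}(sx + te) = s\,\lambda_{\min}(x) + t$ on one sample. The function name `lam_min` and the sample vectors are illustrative choices, not taken from the paper.

```python
def lam_min(x, e):
    """lambda_min for the non-negative orthant: min_j x_j / e_j (all e_j > 0)."""
    return min(xj / ej for xj, ej in zip(x, e))

# Hypothetical data: a distinguished direction with positive coordinates.
e = [1.0, 2.0, 4.0]
x = [3.0, -1.0, 2.0]

s, t = 2.5, -0.75                      # any s >= 0 and any scalar t
shifted = [s * xj + t * ej for xj, ej in zip(x, e)]

lhs = lam_min(shifted, e)              # lambda_min(s x + t e)
rhs = s * lam_min(x, e) + t            # s lambda_min(x) + t
```

Both sides agree, as (2.1) predicts; the check is only a sanity test on one point, not a proof.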

Let

$$ B := \{v \in \mathcal{E} : e + v,\ e - v \in \mathcal{K}\} , $$

a closed, centrally-symmetric, convex set with nonempty interior. Define a
seminorm$^1$ on $\mathcal{E}$ according to

$$ \|u\|_\infty := \min\{t : u = tv \text{ for some } v \in B\} . $$

Let $B_\infty(x, r)$ denote the closed ball centered at $x$ and of radius $r$. Clearly, $B_\infty(0,1) = B$,
and $B_\infty(e, 1)$ is the largest subset of $\mathcal{K}$ that has symmetry point $e$, i.e., for each
$v$, either both points $e + v$ and $e - v$ are in the set, or neither point is in the set.

It is straightforward to show $\|\cdot\|_\infty$ is a norm if and only if $\mathcal{K}$ is pointed (i.e.,
contains no subspace other than $\{0\}$).

Proposition 2.1. The function $x \mapsto \lambda_{\min}(x)$ is concave and Lipschitz continuous:

$$ |\lambda_{\min}(x) - \lambda_{\min}(y)| \le \|x - y\|_\infty \quad \text{for all } x, y \in \mathcal{E} . $$

Proof: Concavity follows easily from the convexity of $\mathcal{K}$, so we focus on establishing
Lipschitz continuity.

Let $x, y \in \mathcal{E}$. According to (2.1), the difference $\lambda_{\min}(x + te) - \lambda_{\min}(y + te)$
is independent of $t$, and of course so is the quantity $\|(x + te) - (y + te)\|_\infty$.
Consequently, in proving the Lipschitz continuity, we may assume $x$ lies in the
boundary of $\mathcal{K}$, that is, we may assume $\lambda_{\min}(x) = 0$. The goal, then, is to prove

$$ |\lambda_{\min}(x + v)| \le \|v\|_\infty \quad \text{for all } v \in \mathcal{E} . \tag{2.2} $$

We consider two cases. First assume $x + v$ does not lie in the interior of $\mathcal{K}$,
that is, assume $\lambda_{\min}(x + v) \le 0$. Then, to establish (2.2), it suffices to show
$\lambda_{\min}(x + v) \ge -\|v\|_\infty$, that is, to show

$$ x + v + \|v\|_\infty\, e \in \mathcal{K} . \tag{2.3} $$


$^1$ Recall that a seminorm $\|\cdot\|$ satisfies $\|tv\| = |t|\,\|v\|$ and $\|u + v\| \le \|u\| + \|v\|$, but unlike a
norm, is allowed to satisfy $\|v\| = 0$ for $v \ne 0$.
However,

$$ v + \|v\|_\infty\, e \in B_\infty(\|v\|_\infty\, e,\ \|v\|_\infty) \subseteq \mathcal{K} , \tag{2.4} $$

the set containment due to $\mathcal{K}$ being a cone and, by construction, $B_\infty(e, 1) \subseteq \mathcal{K}$.
Since $x \in \mathcal{K}$ (indeed, $x$ is in the boundary of $\mathcal{K}$), (2.3) follows.

Now consider the case $x + v \in \mathcal{K}$, i.e., $\lambda_{\min}(x + v) \ge 0$. To establish (2.2), it
suffices to show $\lambda_{\min}(x + v) \le \|v\|_\infty$, that is, to show

$$ x + v - \|v\|_\infty\, e \notin \mathrm{int}(\mathcal{K}) . $$

Assume otherwise, that is, assume

$$ x = w + \|v\|_\infty\, e - v \quad \text{for some } w \in \mathrm{int}(\mathcal{K}) . $$

Since $\|v\|_\infty\, e - v \in \mathcal{K}$ (by the set containment on the right of (2.4)), it then follows
that $x \in \mathrm{int}(\mathcal{K})$, a contradiction to $x$ lying in the boundary of $\mathcal{K}$. □
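For the non-negative orthant, the set $B$ is the box $\{v : |v_j| \le e_j\}$, so the induced seminorm is $\|u\|_\infty = \max_j |u_j|/e_j$. The following Python sketch checks the Lipschitz bound of Proposition 2.1 on random pairs. It is a sanity check under that orthant assumption only; the helper names, the vector `e`, and the sampling range are hypothetical.

```python
import random

def lam_min(x, e):
    """lambda_min for the orthant with distinguished direction e > 0."""
    return min(xj / ej for xj, ej in zip(x, e))

def seminorm_inf(u, e):
    """Seminorm induced by B = {v : e + v >= 0, e - v >= 0}: max_j |u_j| / e_j."""
    return max(abs(uj) / ej for uj, ej in zip(u, e))

random.seed(0)
e = [1.0, 2.0, 0.5, 3.0]               # hypothetical distinguished direction

# worst_gap records the smallest value of ||x - y||_inf - |lam(x) - lam(y)| seen.
worst_gap = float("inf")
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in e]
    y = [random.uniform(-5.0, 5.0) for _ in e]
    diff = [a - b for a, b in zip(x, y)]
    gap = seminorm_inf(diff, e) - abs(lam_min(x, e) - lam_min(y, e))
    worst_gap = min(worst_gap, gap)
```

A non-negative `worst_gap` (up to rounding) is what Proposition 2.1 guarantees.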

Assume the Euclidean space $\mathcal{E}$ is endowed with inner product written $u \cdot v$. Let
$\mathrm{Affine} \subseteq \mathcal{E}$ be an affine space, i.e., the translate of a subspace. For fixed $c \in \mathcal{E}$,
consider the conic program

$$ \left. \begin{array}{rl} \inf & c \cdot x \\ \text{s.t.} & x \in \mathrm{Affine} \\ & x \in \mathcal{K} \end{array} \right\} \ \mathrm{CP} $$

Let $z^*$ denote the optimal value.

Assume $c$ is not orthogonal to the subspace of which Affine is a translate, since
otherwise all feasible points are optimal. This assumption implies that all optimal
solutions for CP lie in the boundary of $\mathcal{K}$.

Assume $\mathrm{Affine} \cap \mathrm{int}(\mathcal{K})$ - the set of strictly feasible points - is nonempty. Fix a
strictly feasible point, $e$. The point $e$ serves as the distinguished direction.

For scalars $z \in \mathbb{R}$, we introduce the affine space

$$ \mathrm{Affine}_z := \{x \in \mathrm{Affine} : c \cdot x = z\} . $$

Presently we show that for any choice of $z$ satisfying $z < c \cdot e$, CP can be easily
transformed into an equivalent optimization problem in which the only constraint
is $x \in \mathrm{Affine}_z$. We need a simple observation.

Lemma 2.2. Assume CP has bounded optimal value.

If $x \in \mathrm{Affine}$ satisfies $c \cdot x < c \cdot e$, then $\lambda_{\min}(x) < 1$.

Proof: If $\lambda_{\min}(x) \ge 1$, then $e + t(x - e)$ is feasible for all $t \ge 0$ (using (2.1)). As
the function $t \mapsto c \cdot (e + t(x - e))$ is strictly decreasing (because $c \cdot x < c \cdot e$), this
implies CP has unbounded optimal value, contrary to assumption. □

For $x \in \mathcal{E}$ satisfying $\lambda_{\min}(x) < 1$, let $\pi(x)$ denote the point where the half-line
beginning at $e$ in direction $x - e$ intersects the boundary of $\mathcal{K}$:

$$ \pi(x) := e + \tfrac{1}{1 - \lambda_{\min}(x)}\,(x - e) $$

(to verify correctness of the expression, observe (2.1) implies $\lambda_{\min}(\pi(x)) = 0$). We
refer to $\pi(x)$ as “the projection (from $e$) of $x$ to the boundary of the feasible region.”
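The projection formula is easy to exercise numerically. The sketch below assumes the orthant case with a positive distinguished direction `e`; the function name `project_to_boundary` and the sample points are our own illustrative choices. It projects one interior and one exterior point and confirms that both images have $\lambda_{\min} = 0$.

```python
def lam_min(x, e):
    """lambda_min for the orthant: min_j x_j / e_j."""
    return min(xj / ej for xj, ej in zip(x, e))

def project_to_boundary(x, e):
    """pi(x) = e + (x - e) / (1 - lam_min(x)); defined only when lam_min(x) < 1."""
    lam = lam_min(x, e)
    if lam >= 1.0:
        raise ValueError("pi(x) is defined only when lam_min(x) < 1")
    scale = 1.0 / (1.0 - lam)
    return [ej + scale * (xj - ej) for xj, ej in zip(x, e)]

e = [1.0, 1.0, 1.0]

x_inside = [2.0, 0.5, 0.5]             # lam_min = 0.5, strictly inside the cone
pi_inside = project_to_boundary(x_inside, e)

x_outside = [-1.0, 2.0, 3.0]           # lam_min = -1, outside the cone
pi_outside = project_to_boundary(x_outside, e)
```

An interior point is pushed outward along the half-line from `e`, and an exterior point is pulled back toward `e`; in both cases the result lands on the boundary.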

The following result is central to the development, and so we state it as a
theorem even though its proof is straightforward.
Theorem 2.3. Let $z$ be any value satisfying $z < c \cdot e$. If $x^*$ solves

$$ \begin{array}{rl} \sup & \lambda_{\min}(x) \\ \text{s.t.} & x \in \mathrm{Affine}_z , \end{array} \tag{2.5} $$

then $\pi(x^*)$ is optimal for CP. Conversely, if $\pi^*$ is optimal for CP, then
$x^* := e + \frac{c \cdot e - z}{c \cdot e - z^*}\,(\pi^* - e)$ is optimal for (2.5), and $\pi^* = \pi(x^*)$.

Proof: Fix a value $z$ satisfying $z < c \cdot e$. It is easily proven from the convexity of $\mathcal{K}$
that $x \mapsto \pi(x)$ gives a one-to-one map from $\mathrm{Affine}_z$ onto

$$ \{\pi \in \mathrm{Affine} \cap \mathrm{bdy}(\mathcal{K}) : c \cdot \pi < c \cdot e\} , \tag{2.6} $$

where $\mathrm{bdy}(\mathcal{K})$ denotes the boundary of $\mathcal{K}$.

For $x \in \mathrm{Affine}_z$, the CP objective value of $\pi(x)$ is

$$ c \cdot \pi(x) = c \cdot \left(e + \tfrac{1}{1 - \lambda_{\min}(x)}\,(x - e)\right) = c \cdot e + \tfrac{1}{1 - \lambda_{\min}(x)}\,(z - c \cdot e) , \tag{2.7} $$

a strictly-decreasing function of $\lambda_{\min}(x)$. Since the map $x \mapsto \pi(x)$ is a bijection
between $\mathrm{Affine}_z$ and the set (2.6), the theorem readily follows. □
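To make Theorem 2.3 concrete, the following Python sketch works through a tiny instance of our own choosing (not from the paper): minimize $x_1 + 2x_2 + 3x_3$ subject to $x_1 + x_2 + x_3 = 3$, $x \ge 0$, with $e$ the all-ones vector, so that $\lambda_{\min}(x) = \min_j x_j$, $z^* = 3$ and $\pi^* = (3,0,0)$. Here $\mathrm{Affine}_z$ is a line, so optimality of the point $x^*$ built from $\pi^*$ can be checked by scanning along its direction `d`.

```python
def lam_min(x):
    """Orthant with e = all-ones: lambda_min(x) = min_j x_j."""
    return min(x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

c = [1.0, 2.0, 3.0]
e = [1.0, 1.0, 1.0]
pi_star = [3.0, 0.0, 0.0]              # optimal solution of the toy LP
z_star = dot(c, pi_star)               # optimal value
z = 4.5                                # any value with z < c.e = 6

# Converse direction of the theorem: x* = e + (c.e - z)/(c.e - z*) (pi* - e).
t = (dot(c, e) - z) / (dot(c, e) - z_star)
x_star = [ej + t * (pj - ej) for pj, ej in zip(pi_star, e)]

# Affine_z is the line x* + s d, where d spans {v : sum(v) = 0, c.v = 0}.
d = [1.0, -2.0, 1.0]
best_on_grid = max(
    lam_min([xj + s * dj for xj, dj in zip(x_star, d)])
    for s in [k / 100.0 for k in range(-200, 201)]
)

# Forward direction: projecting x* from e recovers pi*.
scale = 1.0 / (1.0 - lam_min(x_star))
pi_of_x_star = [ej + scale * (xj - ej) for xj, ej in zip(x_star, e)]
```

On this instance `x_star` is feasible for (2.5), no grid point on the line does better, and `pi_of_x_star` coincides with `pi_star`.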

CP has been transformed into an equivalent linearly-constrained maximization
problem with concave, Lipschitz-continuous objective function. Virtually any
subgradient method - rather, supgradient method - can be applied to this problem,
the main cost per iteration being in computing a supgradient and projecting it onto
the subspace $\mathcal{L}$ of which the affine space $\mathrm{Affine}_z$ is a translate.

For illustration, we digress to interpret the implications of the development thus
far for the linear program

$$ \left. \begin{array}{rl} \min_{x \in \mathbb{R}^n} & c^T x \\ \text{s.t.} & Ax = b \\ & x \ge 0 \end{array} \right\} \ \mathrm{LP} $$

assuming $e = \mathbf{1}$ (the vector of all ones), in which case $\lambda_{\min}(x) = \min_j x_j$, and $\|\cdot\|_\infty$
is the $\ell_\infty$ norm, i.e., $\|v\|_\infty = \max_j |v_j|$. Let the number of rows of $A$ be $m \ge 1$.

For any scalar $z < c^T \mathbf{1}$, Theorem 2.3 asserts that LP is equivalent to

$$ \begin{array}{rl} \max_x & \min_j x_j \\ \text{s.t.} & Ax = b \\ & c^T x = z , \end{array} \tag{2.8} $$

in that when $x$ is feasible for (2.8), $x$ is optimal if and only if the projection
$\pi(x) = \mathbf{1} + \frac{1}{1 - \min_j x_j}\,(x - \mathbf{1})$ is optimal for LP. The setup is shown schematically in
the following figure:
Proposition 2.1 asserts that, as is obviously true, $x \mapsto \min_j x_j$ is $\ell_\infty$-Lipschitz
continuous with constant 1. Consequently, the function also is $\|\cdot\|_2$-Lipschitz
continuous with constant 1, as is relevant if supgradient methods rely on the standard
inner product in computing supgradients and their orthogonal projections onto the
subspace $\mathcal{L}$ of which $\mathrm{Affine}_z$ is a translate, i.e., $\mathcal{L} = \{v : Av = 0 \text{ and } c^T v = 0\}$.

With respect to the standard inner product, the supgradients of $x \mapsto \min_j x_j$ at
$x$ are the convex combinations of the standard basis vectors $e(k)$ for which $x_k =
\min_j x_j$. Consequently, the projected supgradients at $x$ are the convex combinations
of the vectors $\bar P_k$ for which $x_k = \min_j x_j$, where $\bar P_k$ is the $k$th column of the matrix
projecting $\mathbb{R}^n$ onto the nullspace of $\bar A = \begin{bmatrix} A \\ c^T \end{bmatrix}$, that is

$$ \bar P = I - \bar A^T (\bar A \bar A^T)^{-1} \bar A . $$

If $m \ll n$, then $\bar P$ is not computed in its entirety, but instead the matrix $\bar M =
(\bar A \bar A^T)^{-1}$ is formed as a preprocessing step, at cost $O(m^2 n)$. Then, for any iterate $x$
and an index $k$ satisfying $x_k = \min_j x_j$, the projected supgradient $\bar P_k$ is computed
according to

$$ u = \bar M \bar A_k \ \rightarrow\ v = \bar A^T u \ \rightarrow\ \bar P_k = e(k) - v , $$

for a cost of $O(m^2 + \#\text{non\_zero\_entries\_in\_}A)$ per iteration.
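The three-step recipe $u = \bar M \bar A_k$, $v = \bar A^T u$, $\bar P_k = e(k) - v$ can be sketched in plain Python as follows. This is a minimal dense illustration, not an efficient implementation: instead of forming $\bar M$ once as a preprocessing step, it solves the small system $(\bar A \bar A^T) u = \bar A_k$ on each call with a hand-written Gaussian elimination. The function names and the sample data are hypothetical.

```python
def solve(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting (small dense M)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    u = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * u[k] for k in range(r + 1, n))
        u[r] = (aug[r][n] - tail) / aug[r][r]
    return u

def projected_supgradient(A_bar, x):
    """Return P_bar_k = e(k) - A_bar^T (A_bar A_bar^T)^{-1} A_bar_k for k = argmin_j x_j."""
    n, rows = len(x), len(A_bar)
    k = min(range(n), key=lambda j: x[j])
    gram = [[sum(a * b for a, b in zip(A_bar[i], A_bar[j])) for j in range(rows)]
            for i in range(rows)]
    col_k = [A_bar[i][k] for i in range(rows)]          # k-th column of A_bar
    u = solve(gram, col_k)                              # u = M_bar A_bar_k
    v = [sum(A_bar[i][j] * u[i] for i in range(rows)) for j in range(n)]  # v = A_bar^T u
    return [(1.0 if j == k else 0.0) - v[j] for j in range(n)]

# Hypothetical data: A = [1 1 1 1] stacked on c^T = [1 2 3 4].
A_bar = [[1.0, 1.0, 1.0, 1.0],
         [1.0, 2.0, 3.0, 4.0]]
x = [3.4, 1.0, -0.2, 0.3]                               # minimum attained at index 2
g = projected_supgradient(A_bar, x)
```

The returned vector lies in the nullspace of $\bar A$, and because it is the orthogonal projection of a standard basis vector, its squared norm equals its $k$th coordinate.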

Before returning to the general theory, we note that if the choices are $\mathcal{E} = \mathbb{S}^n$,
$\mathcal{K} = \mathbb{S}^n_+$ and $e = I$ (and thus $\lambda_{\min}(X)$ is the minimum eigenvalue of $X$), then with
respect to the trace inner product, the supgradients at $X$ for the function $X \mapsto
\lambda_{\min}(X)$ are the convex combinations of the matrices $vv^T$, where $Xv = \lambda_{\min}(X)\,v$
and $\|v\|_2 = 1$.

Assume, henceforth, that CP has at least one optimal solution, and that $z$ is a
fixed scalar satisfying $z < c \cdot e$. Then the equivalent problem (2.5) has at least
one optimal solution. Let $x_z^*$ denote any of the optimal solutions for the equivalent
problem, and recall $z^*$ denotes the optimal value of CP. A useful characterization
of the optimal value for the equivalent problem is easily provided.


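Lemma 2.4.

$$ \lambda_{\min}(x_z^*) = \frac{z - z^*}{c \cdot e - z^*} . $$

Proof: By Theorem 2.3, $\pi(x_z^*)$ is optimal for CP; in particular, $c \cdot \pi(x_z^*) = z^*$.
Thus, according to (2.7),

$$ z^* = c \cdot e + \tfrac{1}{1 - \lambda_{\min}(x_z^*)}\,(z - c \cdot e) . $$

Rearrangement completes the proof. □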


We focus on the goal of computing a point $\pi$ which is feasible for CP and has
better objective value than $e$ in that

$$ \frac{c \cdot \pi - z^*}{c \cdot e - z^*} \le \epsilon , \tag{2.9} $$

where $0 < \epsilon < 1$. Thus, for the problem of primary interest, CP, the focus is on
relative improvement in the objective value.


The following proposition provides a useful characterization of the accuracy 
needed in approximately solving the CP-equivalent problem (2.5) so as to ensure 
that for the computed point $x$, the projection $\pi = \pi(x)$ satisfies (2.9).
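Proposition 2.5. If $x \in \mathrm{Affine}_z$ and $0 < \epsilon < 1$, then

$$ \frac{c \cdot \pi(x) - z^*}{c \cdot e - z^*} \le \epsilon $$

if and only if

$$ \lambda_{\min}(x_z^*) - \lambda_{\min}(x) \le \frac{\epsilon}{1 - \epsilon}\ \frac{c \cdot e - z}{c \cdot e - z^*} . $$

Proof: Assume $x \in \mathrm{Affine}_z$. For $y = x$ and $y = x_z^*$, we have the equality (2.7),
that is,

$$ c \cdot \pi(y) = c \cdot e + \tfrac{1}{1 - \lambda_{\min}(y)}\,(z - c \cdot e) . $$

Thus,

$$ \frac{c \cdot \pi(x) - z^*}{c \cdot e - z^*} = \frac{c \cdot \pi(x) - c \cdot \pi(x_z^*)}{c \cdot e - c \cdot \pi(x_z^*)} = \frac{\frac{1}{1 - \lambda_{\min}(x_z^*)} - \frac{1}{1 - \lambda_{\min}(x)}}{\frac{1}{1 - \lambda_{\min}(x_z^*)}} = \frac{\lambda_{\min}(x_z^*) - \lambda_{\min}(x)}{1 - \lambda_{\min}(x)} . $$

Hence,

$$ \begin{array}{rcl} \dfrac{c \cdot \pi(x) - z^*}{c \cdot e - z^*} \le \epsilon & \Longleftrightarrow & \lambda_{\min}(x_z^*) - \lambda_{\min}(x) \le \epsilon\,(1 - \lambda_{\min}(x)) \\[1ex] & \Longleftrightarrow & (1 - \epsilon)\,(\lambda_{\min}(x_z^*) - \lambda_{\min}(x)) \le \epsilon\,(1 - \lambda_{\min}(x_z^*)) \\[1ex] & \Longleftrightarrow & \lambda_{\min}(x_z^*) - \lambda_{\min}(x) \le \frac{\epsilon}{1 - \epsilon}\,(1 - \lambda_{\min}(x_z^*)) . \end{array} $$

Using Lemma 2.4 to substitute for the rightmost occurrence of $\lambda_{\min}(x_z^*)$ completes
the proof. □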
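The equivalence in Proposition 2.5, together with the value given by Lemma 2.4, can be checked numerically on a small instance. The sketch below uses a toy LP of our own (minimize $x_1 + 2x_2 + 3x_3$ subject to $x_1 + x_2 + x_3 = 3$, $x \ge 0$, with $e$ the all-ones vector, so $z^* = 3$) and the level $z = 4.5$, for which the maximizer of $\min_j x_j$ on $\mathrm{Affine}_z$ is $(2, 0.5, 0.5)$. The grid and the tested values of $\epsilon$ are arbitrary choices that avoid exact ties between the two sides.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pi(x, e):
    """Projection from e to the boundary of the orthant (e = all-ones here)."""
    lam = min(x)
    return [ej + (xj - ej) / (1.0 - lam) for xj, ej in zip(x, e)]

c, e = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
z_star, z = 3.0, 4.5
x_opt = [2.0, 0.5, 0.5]                # maximizer of min_j x_j on Affine_z
d = [1.0, -2.0, 1.0]                   # Affine_z = {x_opt + s d}

# Lemma 2.4: lambda_min(x_z^*) should equal (z - z*) / (c.e - z*).
lemma_2_4_rhs = (z - z_star) / (dot(c, e) - z_star)

# Proposition 2.5: the two inequalities should hold or fail together.
agree = True
for k in range(-30, 31):
    s = k / 100.0
    x = [xj + s * dj for xj, dj in zip(x_opt, d)]
    rel_err = (dot(c, pi(x, e)) - z_star) / (dot(c, e) - z_star)
    gap = min(x_opt) - min(x)
    for eps in (0.05, 0.2, 0.4, 0.9):
        lhs = rel_err <= eps
        rhs = gap <= eps / (1.0 - eps) * (dot(c, e) - z) / (dot(c, e) - z_star)
        agree = agree and (lhs == rhs)
```

On every sampled point the relative-error test on $\pi(x)$ and the $\lambda_{\min}$-gap test give the same verdict.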


In concluding the section, we remark that the basic theory holds for convex conic
optimization problems generally. For example, consider a conic program

$$ \left. \begin{array}{rl} \min_{x \in \mathcal{E}} & c \cdot x \\ \text{s.t.} & x \in \mathrm{Affine} \\ & Ax + b \in \mathcal{K}' \end{array} \right\} \ \mathrm{CP}' \tag{2.10} $$

Here, $A$ is a linear operator from $\mathcal{E}$ to a Euclidean space $\mathcal{E}'$, $b \in \mathcal{E}'$ and $\mathcal{K}'$ is a
proper, closed, convex cone in $\mathcal{E}'$ with nonempty interior.

For a problem with multiple conic constraints, simply let $\mathcal{K}'$ be the Cartesian
product of the cones.

Obviously, the optimization problem CP corresponds to the case that $A$ is the
identity, $b = 0$ and $\mathcal{K}' = \mathcal{K}$. (Thus, on the surface, CP$'$ appears to be more general
than CP.)

Fix a feasible point $e$ for which $e' := Ae + b \in \mathrm{int}(\mathcal{K}')$. Using $e'$ as the distinguished
direction results in a function $x' \mapsto \lambda'_{\min}(x')$ and a seminorm $\|\cdot\|'_\infty$ on $\mathcal{E}'$.
A seminorm is induced on $\mathcal{E}$, according to $v \mapsto \|Av\|'_\infty$, for which the closed unit
ball centered at $e$ is the largest subset of $\{x : Ax + b \in \mathcal{K}'\}$ with symmetry point $e$.
The map $x \mapsto \lambda'_{\min}(Ax + b)$ is Lipschitz continuous:

$$ |\lambda'_{\min}(Ax + b) - \lambda'_{\min}(Ay + b)| \le \|A(x - y)\|'_\infty \quad \text{for all } x, y \in \mathcal{E} . $$

If $x \in \mathrm{Affine}$ satisfies $c \cdot x < c \cdot e$, then $\lambda'_{\min}(Ax + b) < 1$, and the projection of
$x$ (from $e$) to the boundary of the feasible region is given by

$$ \pi'(x) = e + \tfrac{1}{1 - \lambda'_{\min}(Ax + b)}\,(x - e) . $$

Assuming $z$ is a scalar satisfying $z < c \cdot e$, the problem

$$ \begin{array}{rl} \max & \lambda'_{\min}(Ax + b) \\ \text{s.t.} & x \in \mathrm{Affine}_z \ (:= \{x \in \mathrm{Affine} : c \cdot x = z\}) \end{array} \tag{2.11} $$

is equivalent to CP$'$ in that when $x \in \mathrm{Affine}_z$, $x$ is optimal for (2.11) if and only if
$\pi'(x)$ is optimal for CP$'$. Moreover, letting $x_z^*$ denote any optimal solution of (2.11),
there holds the relation for all $x \in \mathrm{Affine}_z$ and $0 < \epsilon < 1$,

$$ \frac{c \cdot \pi'(x) - z^*}{c \cdot e - z^*} \le \epsilon \quad \Longleftrightarrow \quad \lambda'_{\min}(Ax_z^* + b) - \lambda'_{\min}(Ax + b) \le \frac{\epsilon}{1 - \epsilon}\ \frac{c \cdot e - z}{c \cdot e - z^*} , $$

where $z^*$ is the optimal value of CP$'$.

These claims regarding CP$'$ are justified with proofs that are essentially identical
to the proofs for CP. Alternatively, they can be deduced from the results for CP
by introducing a new variable $t$ into CP$'$ and an equation $t = 1$, then replacing $\mathcal{K}'$
by $\mathcal{K} := \{(x,t) : Ax + tb \in \mathcal{K}'\}$, thereby recasting CP$'$ to be of the same form as
CP. (Only on the surface does CP$'$ appear to be more general than CP.)

We focus on CP because notationally its form is least cumbersome. For every
result derived in the following sections, an essentially identical result holds for any
conic program, even identical in the specific constants.


3. Applying Supgradient Methods 


In this section we show how the basic theory from Section 2 leads to complexity
results regarding the solution of the conic program

$$ \left. \begin{array}{rl} \min & c \cdot x \\ \text{s.t.} & x \in \mathrm{Affine} \\ & x \in \mathcal{K} \end{array} \right\} \ \mathrm{CP} $$

Continue to assume CP has an optimal solution, denote the optimal value by $z^*$,
and let $e$ be a strictly feasible point, the distinguished direction.

Given $\epsilon > 0$ and a value $z$ satisfying $z < c \cdot e$, the approach is to apply supgradient
methods to approximately solve

$$ \begin{array}{rl} \max & \lambda_{\min}(x) \\ \text{s.t.} & x \in \mathrm{Affine}_z , \end{array} \tag{3.1} $$

where by “approximately solve” we mean that $x \in \mathrm{Affine}_z$ is computed for which

$$ \lambda_{\min}(x_z^*) - \lambda_{\min}(x) \le \frac{\epsilon}{1 - \epsilon}\ \frac{c \cdot e - z}{c \cdot e - z^*} . $$

Indeed, according to Proposition 2.5, the projection $\pi = \pi(x)$ will then satisfy

$$ \frac{c \cdot \pi - z^*}{c \cdot e - z^*} \le \epsilon . $$
Recall that $\mathcal{L}$ is the subspace of which the affine space $\mathrm{Affine}_z$ is a translate.
When supgradient methods are applied to solving (3.1), $\mathcal{L}$ is the subspace onto which
supgradients are orthogonally projected.

Supgradients and their orthogonal projections depend on the inner product.$^2$
We allow the “computational inner product” to differ from the one relied upon in
expressing CP, the inner product written $u \cdot v$. We denote the computational inner
product by $\langle\ ,\ \rangle$, and let $\|\cdot\|$ be its norm.

It is an instructive exercise to show that the supdifferential (set of supgradients)
at $x$ for the function $x \mapsto \lambda_{\min}(x)$ is

$$ \{g : \langle g, e \rangle = 1 \text{ and } \langle g,\ y - (x - \lambda_{\min}(x)\, e) \rangle \ge 0 \text{ for all } y \in \mathcal{K}\} , $$

that is, the supdifferential consists of vectors $g$ such that $\langle g, e \rangle = 1$ and $-g$ is in
the normal cone to $\mathcal{K}$ at $x - \lambda_{\min}(x)\, e$. (To begin, note it may be assumed that
$\lambda_{\min}(x) = 0$, due to (2.1).)

For scalars $z$, let

$$ M_z := \sup\left\{ \frac{|\lambda_{\min}(x) - \lambda_{\min}(y)|}{\|x - y\|} : x, y \in \mathrm{Affine}_z \text{ and } x \ne y \right\} , $$

the Lipschitz constant for the map $x \mapsto \lambda_{\min}(x)$ restricted to $\mathrm{Affine}_z$. Proposition 2.1
implies $M_z$ is well-defined (finite), although unlike the Lipschitz constant
for the norm appearing there (i.e., $\|\cdot\|_\infty$), $M_z$ might exceed 1, depending on $\|\cdot\|$.

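We claim the values $M_z$ are identical for all $z$. To see why, consider that for
$z_1 < z_2 < c \cdot e$, a bijection from $\mathrm{Affine}_{z_1}$ onto $\mathrm{Affine}_{z_2}$ is provided by the map

$$ x \mapsto y(x) := x + \tfrac{z_2 - z_1}{c \cdot e - z_1}\,(e - x) . $$

Observe, using (2.1),

$$ \lambda_{\min}(y(x)) = \tfrac{c \cdot e - z_2}{c \cdot e - z_1}\,\lambda_{\min}(x) + \tfrac{z_2 - z_1}{c \cdot e - z_1} , $$

and thus

$$ \lambda_{\min}(y(x)) - \lambda_{\min}(y(\bar x)) = \tfrac{c \cdot e - z_2}{c \cdot e - z_1}\,(\lambda_{\min}(x) - \lambda_{\min}(\bar x)) \quad \text{for } x, \bar x \in \mathrm{Affine}_{z_1} . $$

Since, additionally, $\|y(x) - y(\bar x)\| = \frac{c \cdot e - z_2}{c \cdot e - z_1}\,\|x - \bar x\|$, it is immediate that the values
$M_z$ are identical for all $z < c \cdot e$. A simple continuity argument then implies this
value is equal to $M_{c \cdot e}$. Analogous reasoning shows $M_z = M_{c \cdot e}$ for all $z > c \cdot e$. In
all, $M_z$ is independent of $z$, as claimed.

Let $M$ denote the common value, i.e., $M = M_z$ for all $z$.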

The following proposition can be useful in modeling and in choosing the computational
inner product (the inner product for whose norm $M$ is the Lipschitz
constant).

Let $B(e, r) := \{x : \|x - e\| \le r\}$.

Proposition 3.1. $M \le 1/\bar r$, where $\bar r := \max\{r : B(e, r) \cap \mathrm{Affine}_{c \cdot e} \subseteq \mathcal{K}\}$.


$^2$ Of course orthogonality in a Euclidean space $\mathcal{E}$ depends on the chosen inner product ($\langle u, v \rangle =
0$), and hence so do orthogonal projections. Recall that supgradients also depend on the chosen
inner product, in that for a concave function $f : \mathcal{E} \to \mathbb{R}$, the supgradients of $f$ at $x$ are the vectors
$\nabla f(x) \in \mathcal{E}$ satisfying $f(x) + \langle \nabla f(x), v \rangle \ge f(x + v)$ for all $v \in \mathcal{E}$.
Proof: According to Proposition 2.1,

$$ |\lambda_{\min}(x) - \lambda_{\min}(y)| \le \|x - y\|_\infty \quad \text{for all } x, y . $$

Consequently, it suffices to show $\|x - y\| \ge \bar r\,\|x - y\|_\infty$ for all $x, y \in \mathrm{Affine}_{c \cdot e}$, i.e.,
it suffices to show for all $v \in \mathcal{L}$ that $\|v\| \ge \bar r\,\|v\|_\infty$.

However, according to the discussion just prior to Proposition 2.1, $B_\infty(e, 1)$
is the largest set which both is contained in $\mathcal{K}$ and has symmetry point $e$, from
which follows that $B_\infty(e, 1) \cap \mathrm{Affine}_{c \cdot e}$ is the largest set which is both contained in
$\mathcal{K} \cap \mathrm{Affine}_{c \cdot e}$ and has symmetry point $e$. Hence

$$ B(e, \bar r) \cap \mathrm{Affine}_{c \cdot e} \subseteq B_\infty(e, 1) \cap \mathrm{Affine}_{c \cdot e} , $$

implying $\|v\| \ge \bar r\,\|v\|_\infty$ for all $v \in \mathcal{L}$. □

In passing we remark that in the context of CP$'$ - the conic program (2.10) -
the role of $\bar r$ in the above proposition is played by

$$ \bar r' = \max\{r : B(e, r) \cap \mathrm{Affine}_{c \cdot e} \subseteq \{x : Ax + b \in \mathcal{K}'\}\} . $$

Then, $|\lambda'_{\min}(Ax + b) - \lambda'_{\min}(Ay + b)| \le (1/\bar r')\,\|x - y\|$ for all $x, y \in \mathrm{Affine}_z$ and $z \in \mathbb{R}$.

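Towards considering specific supgradient methods, we recall the following standard
and elementary result, rephrased for our setting:

Lemma 3.2. Assume $z \in \mathbb{R}$ and $x, y \in \mathrm{Affine}_z$. Let $g$ be the projection of a
supgradient $\nabla\lambda_{\min}(x)$ onto $\mathcal{L}$ (the subspace of which $\mathrm{Affine}_z$ is a translate).

For all scalars $\alpha \ge 0$,

$$ \|(x + \alpha g) - y\|^2 \le \|x - y\|^2 - 2\alpha\,(\lambda_{\min}(y) - \lambda_{\min}(x)) + \alpha^2 \|g\|^2 . \tag{3.2} $$

Proof: Simply observe

$$ \begin{array}{rcl} \|(x + \alpha g) - y\|^2 & = & \|x - y\|^2 + 2\alpha\,\langle g, x - y \rangle + \alpha^2 \|g\|^2 \\ & = & \|x - y\|^2 - 2\alpha\,\langle \nabla\lambda_{\min}(x), y - x \rangle + \alpha^2 \|g\|^2 \quad (\text{by } x - y \in \mathcal{L}) \\ & \le & \|x - y\|^2 - 2\alpha\,(\lambda_{\min}(y) - \lambda_{\min}(x)) + \alpha^2 \|g\|^2 , \end{array} $$

the inequality due to concavity of the map $x \mapsto \lambda_{\min}(x)$ and to $\alpha \ge 0$. □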

We present and analyze two algorithms. We begin by considering the ideal case
in which $z^*$, the optimal value for CP, is known. The main result for the algorithm
here provides a benchmark to which to compare our result for the general algorithm
developed subsequently.

Knowing $z^*$ is not an entirely implausible situation. For example, if strict feasibility
holds for a primal conic program and for its dual

$$ \begin{array}{rl} \min & c^T x \\ \text{s.t.} & Ax = b \\ & x \in \mathcal{K} \end{array} \qquad\qquad \begin{array}{rl} \max & b^T y \\ \text{s.t.} & A^T y + s = c \\ & s \in \mathcal{K}^* , \end{array} $$

then the combined primal-dual conic program is known to have optimal value equal
to zero:

$$ \begin{array}{rl} \min & c^T x - b^T y \\ \text{s.t.} & Ax = b \\ & A^T y + s = c \\ & (x, s) \in \mathcal{K} \times \mathcal{K}^* . \end{array} $$
Algorithm 1:

(0) Input: $z^*$, the optimal value of CP,
    $e$, a strictly feasible point for CP, and
    $x \in \mathrm{Affine}$ satisfying $c \cdot x < c \cdot e$.

    Initialize: Let $x_0 = e + \frac{c \cdot e - z^*}{c \cdot e - c \cdot x}\,(x - e)$ (thus, $c \cdot x_0 = z^*$),
    and let $\pi_0 = \pi(x_0)$ ($= \pi(x)$).

(1) Iterate: $x_{k+1} = x_k - \frac{\lambda_{\min}(x_k)}{\|g_k\|^2}\, g_k$,
    where $g_k$ is the projection of a supgradient $\nabla\lambda_{\min}(x_k)$ onto $\mathcal{L}$.
    Let $\pi_{k+1} = \pi(x_{k+1})$ (which is feasible for CP).


All of the iterates $x_k$ lie in $\mathrm{Affine}_{z^*}$, and hence, $\lambda_{\min}(x_k) \le 0$, with equality if
and only if $x_k$ is feasible (and optimal) for CP.
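The following Python sketch runs Algorithm 1 on a toy linear program of our own choosing (minimize $x_1 + 2x_2 + 3x_3 + 4x_4$ subject to $x_1 + x_2 + x_3 + x_4 = 4$, $x \ge 0$, with $e$ the all-ones vector and $z^* = 4$). It assumes the standard inner product, takes the supgradient $e(k)$ at an index $k$ attaining $\min_j x_j$, and projects onto $\mathcal{L}$ via Gram-Schmidt on the rows of $[A; c^T]$. All function names are hypothetical, and the code is an illustration rather than the author's implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_rows(rows):
    """Gram-Schmidt on the rows of [A; c^T] (assumed linearly independent)."""
    basis = []
    for r in rows:
        w = r[:]
        for q in basis:
            coef = dot(w, q)
            w = [wj - coef * qj for wj, qj in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wj / norm for wj in w])
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the orthogonal complement of span(basis)."""
    for q in basis:
        coef = dot(v, q)
        v = [vj - coef * qj for vj, qj in zip(v, q)]
    return v

def pi(x):
    """Projection from e = all-ones to the boundary of the orthant."""
    lam = min(x)
    return [1.0 + (xj - 1.0) / (1.0 - lam) for xj in x]

def algorithm_1(A_rows, c, z_star, x_in, iters):
    n = len(c)
    e = [1.0] * n
    basis = orthonormal_rows(A_rows + [c])
    t = (dot(c, e) - z_star) / (dot(c, e) - dot(c, x_in))
    x = [1.0 + t * (xj - 1.0) for xj in x_in]            # x_0, with c.x_0 = z*
    xs, pis = [x], [pi(x)]
    for _ in range(iters):
        lam = min(x)
        k = x.index(lam)                                  # index attaining the minimum
        g = project([1.0 if j == k else 0.0 for j in range(n)], basis)
        gg = dot(g, g)
        if lam >= 0.0 or gg < 1e-18:                      # x is (numerically) optimal
            break
        step = -lam / gg                                  # x_{k+1} = x_k - (lam/||g||^2) g
        x = [xj + step * gj for xj, gj in zip(x, g)]
        xs.append(x)
        pis.append(pi(x))
    return xs, pis

A_rows = [[1.0, 1.0, 1.0, 1.0]]
c = [1.0, 2.0, 3.0, 4.0]
z_star = 4.0
xs, pis = algorithm_1(A_rows, c, z_star, [2.0, 1.0, 0.5, 0.5], iters=400)
rel_errors = [(dot(c, p) - z_star) / (dot(c, [1.0] * 4) - z_star) for p in pis]
```

Every iterate stays in $\mathrm{Affine}_{z^*}$, every projected point $\pi_k$ is feasible for the LP, and the distance from $x_k$ to the unique optimal point $(4,0,0,0)$ never increases, as Lemma 3.2 implies for this step size.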

For all scalars $z < c \cdot e$ and for $x \in \mathrm{Affine}_z$, define

$$ \mathrm{dist}_z(x) := \min\{\|x - x_z^*\| : x_z^* \text{ is optimal for (3.1)}\} . $$

Proposition 3.3. The iterates for Algorithm 1 satisfy

$$ \max\{\lambda_{\min}(x_k) : k = \ell, \ldots, \ell + m\} \ge -M\, \mathrm{dist}_{z^*}(x_\ell)/\sqrt{m + 1} . $$

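Proof: Lemma 3.2 implies

$$ \begin{array}{rcl} \mathrm{dist}_{z^*}(x_{k+1})^2 & \le & \mathrm{dist}_{z^*}(x_k)^2 + 2\,\frac{\lambda_{\min}(x_k)}{\|g_k\|^2}\,(0 - \lambda_{\min}(x_k)) + \left(\frac{\lambda_{\min}(x_k)}{\|g_k\|}\right)^2 \\[1ex] & = & \mathrm{dist}_{z^*}(x_k)^2 - \left(\frac{\lambda_{\min}(x_k)}{\|g_k\|}\right)^2 , \end{array} $$

and thus by induction (and using $\|g_k\| \le M$),

$$ \begin{array}{rcl} \mathrm{dist}_{z^*}(x_{\ell+m+1})^2 & \le & \mathrm{dist}_{z^*}(x_\ell)^2 - \sum_{k=\ell}^{\ell+m} (\lambda_{\min}(x_k)/M)^2 \\[1ex] & \le & \mathrm{dist}_{z^*}(x_\ell)^2 - \frac{m+1}{M^2}\, \min\{\lambda_{\min}(x_k)^2 : k = \ell, \ldots, \ell + m\} , \end{array} $$

implying the proposition (keeping in mind $\lambda_{\min}(x_k) \le 0$). □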

We briefly digress to consider the case of $\mathcal{K}$ being polyhedral, where already
an interesting result is easily proven. The following corollary is offered only as a
curiosity, as the constant $C_2$ typically is so large as to render the lower bound on $\ell$
meaningless except for minuscule $\epsilon$.

Corollary 3.4. Assume $\mathcal{K}$ is polyhedral. There exist constants $C_1$ and $C_2$ (dependent
on CP, $e$, $x$ and the computational inner product), such that for all $0 < \epsilon < 1$,

$$ \ell \ge C_1 + C_2 \log(1/\epsilon) \quad \Longrightarrow \quad \min_{k \le \ell} \frac{c \cdot \pi_k - z^*}{c \cdot e - z^*} \le \epsilon . $$

For first-order methods, such a logarithmic bound in $\epsilon$ was initially established by
Gilpin, Peña and Sandholm [?]. They did not assume an initial feasible point $e$ was
known, but neither did they require the computed solution to be feasible (instead,
constraint residuals were required to be small). They relied on an accelerated
gradient method, along with the smoothing technique of Nesterov [?]. As is the
case for the above result, they assumed the optimal value of CP to be known a priori,
and they restricted $\mathcal{K}$ to be polyhedral.

The proof of the corollary depends on the following simple lemma. 

Lemma 3.5. For Algorithm 1, the iterates satisfy

$$ \frac{c \cdot \pi_k - z^*}{c \cdot e - z^*} = \frac{-\lambda_{\min}(x_k)}{1 - \lambda_{\min}(x_k)} . $$

Proof: Immediate from $\pi_k = e + \frac{1}{1 - \lambda_{\min}(x_k)}\,(x_k - e)$ and $c \cdot x_k = z^*$. □

Proof of Corollary 3.4: With $\mathcal{K}$ being polyhedral, the concave function $x \mapsto
\lambda_{\min}(x)$ is piecewise linear, and thus there exists a positive constant $C$ such that

$$ \mathrm{dist}_{z^*}(x) \le -C\, \lambda_{\min}(x) \quad \text{for all } x \in \mathrm{Affine}_{z^*} . $$

Then Proposition 3.3 gives

$$ \max\{\lambda_{\min}(x_k) : k = \ell, \ldots, \ell + m\} \ge C M\, \lambda_{\min}(x_\ell)/\sqrt{m + 1} , $$

from which follows

$$ \max\{\lambda_{\min}(x_k) : k = \ell, \ldots, \ell + \lceil (2CM)^2 \rceil\} \ge \tfrac12\, \lambda_{\min}(x_\ell) , $$

i.e., $\lambda_{\min}(x_\ell)$ is “halved” within $\lceil (2CM)^2 \rceil$ iterations. The proof is easily completed
using Lemma 3.5. □

We now return to considering general convex cones $\mathcal{K}$.

The iteration bound provided by Proposition 3.3 bears little obvious connection
to the geometry of the conic program CP, except in that the constant $M$ is related
to the geometry by Proposition 3.1. The other constant - $\mathrm{dist}_{z^*}(x_\ell)$ - does not
at present have such a clear geometrical connection to CP. We next observe a
meaningful connection.

The level sets for CP are the sets

$$ \mathrm{Level}_z = \mathrm{Affine}_z \cap \mathcal{K} , $$

that is, the largest feasible sets for CP on which the objective function is constant$^3$.
If $z < z^*$, then $\mathrm{Level}_z = \emptyset$.

If some level set is unbounded, then either CP has unbounded optimal value or
can be made to have unbounded value with an arbitrarily small perturbation of $c$.
Thus, in developing numerical optimization methods, it is natural to focus on the
case that level sets for CP are bounded.

For scalars $z$, let

$$ \mathrm{diam}_z := \sup\{\|x - y\| : x, y \in \mathrm{Level}_z\} , $$

the diameter of $\mathrm{Level}_z$. (If $\mathrm{Level}_z = \emptyset$, let $\mathrm{diam}_z := -\infty$.)


$^3$ There is a possibility of confusion here, as in the optimization literature, the terminology “level
set” is often used for the portion of the feasible region on which the (convex) objective function
does not exceed a specified value rather than - as for us - exactly equals the value. Our choice
of terminology is consistent with the general mathematical literature, where the region on which
a function does not exceed a specified value is referred to as a sublevel set, not a level set.
Lemma 3.6. Assume $x \in \mathrm{Affine}_{z^*}$, and let $\pi = \pi(x)$. Then

$$ \mathrm{dist}_{z^*}(x) = (1 - \lambda_{\min}(x))\, \mathrm{dist}_{c \cdot \pi}(\pi) = \frac{\mathrm{dist}_{c \cdot \pi}(\pi)}{1 - \frac{c \cdot \pi - z^*}{c \cdot e - z^*}} \le \frac{\mathrm{diam}_{c \cdot \pi}}{1 - \frac{c \cdot \pi - z^*}{c \cdot e - z^*}} . $$

Proof: Since

$$ \pi = e + \tfrac{1}{1 - \lambda_{\min}(x)}\,(x - e) , \tag{3.3} $$

Theorem 2.3 implies that the maximizers of the map $y \mapsto \lambda_{\min}(y)$ over $\mathrm{Affine}_{c \cdot \pi}$
are precisely the points of the form

$$ x_{c \cdot \pi}^* = e + \tfrac{1}{1 - \lambda_{\min}(x)}\,(x_{z^*}^* - e) , $$

where $x_{z^*}^*$ is a maximizer of the map when restricted to $\mathrm{Affine}_{z^*}$ (i.e., is an optimal
solution of CP). Observing

$$ \pi - x_{c \cdot \pi}^* = \tfrac{1}{1 - \lambda_{\min}(x)}\,(x - x_{z^*}^*) , $$

it follows that

$$ \mathrm{dist}_{c \cdot \pi}(\pi) = \tfrac{1}{1 - \lambda_{\min}(x)}\, \mathrm{dist}_{z^*}(x) , $$

establishing the first equality in the statement of the lemma. The second equality
then follows easily from (3.3) and $c \cdot x = z^*$. The inequality is due simply to
$\pi, x_{c \cdot \pi}^* \in \mathrm{Level}_{c \cdot \pi}$, for all optimal solutions $x_{c \cdot \pi}^*$ of the CP-equivalent problem (3.1)
(with $z = c \cdot \pi$). □


For scalars $z$, define

$$ \mathrm{Diam}_z := \max\{\mathrm{diam}_{z'} : z' \le z\} , $$

the “horizontal diameter” of the sublevel set consisting of points $x$ that are feasible
for CP and satisfy $c \cdot x \le z$. For $z^* < z < c \cdot e$, the value $\mathrm{Diam}_z$ can be thought of as
a kind of condition number for CP, because $\mathrm{Diam}_z$ being large is an indication that
the optimal value for CP is relatively sensitive to perturbations in the objective
vector $c$. (Related quantities have played the role of condition number in the
complexity theory of interior-point methods - cf. [?].)

For $z^* < z < c \cdot e$, define

$$ \mathrm{Dist}_z := \sup\{\mathrm{dist}_{z'}(x) : z' \le z \text{ and } x \in \mathrm{Level}_{z'}\} . $$

Clearly, there holds the relation

$$ \mathrm{Dist}_z \le \mathrm{Diam}_z , $$

and hence if the “condition number” $\mathrm{Diam}_z$ is only of modest size, so is the value
$\mathrm{Dist}_z$.

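Following is our main result for Algorithm 1. By substituting $\mathrm{Diam}_{c \cdot \pi_0}$ for
$\mathrm{Dist}_{c \cdot \pi_0}$, and $1/\bar r$ for $M$ (where $\bar r$ is as in Proposition 3.1), the statement of the
theorem becomes phrased in terms clearly reflecting the geometry of CP.

Theorem 3.7. Assume $0 < \epsilon < \frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}$, where $\pi_0 = \pi(x_0)$ is the initial CP-feasible
point for Algorithm 1 (i.e., assume $\pi_0$ does not itself satisfy the desired accuracy).
Then

$$ \ell \ge (2M\, \mathrm{Dist}_{c \cdot \pi_0})^2 \left( \frac43 \left(\frac{1 - \epsilon}{\epsilon}\right)^2 + 4\, \frac{1 - \epsilon}{\epsilon} + 1 + \log_2\!\left(\frac{1 - \epsilon}{\epsilon}\right) + \log_2\!\left( \frac{\frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}}{1 - \frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}} \right) \right) $$

$$ \Longrightarrow \quad \min_{k \le \ell} \frac{c \cdot \pi_k - z^*}{c \cdot e - z^*} \le \epsilon . $$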

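Proof: To ease notation, let $\lambda_k := \lambda_{\min}(x_k)$.

Let $k_0 = 0$ and recursively define $k_{i+1}$ to be the first index for which $\lambda_{k_{i+1}} \ge
\lambda_{k_i}/2$ (keeping in mind $\lambda_k \le 0$ for all $k$). Proposition 3.3 implies

$$ \begin{array}{rcl} k_{i+1} - k_i & \le & \left( \dfrac{2M\, \mathrm{dist}_{z^*}(x_{k_i})}{\lambda_{k_i}} \right)^2 \\[2ex] & \le & \left( 2M\, \mathrm{dist}_{c \cdot \pi_{k_i}}(\pi_{k_i})\ \dfrac{1 - \lambda_{k_i}}{\lambda_{k_i}} \right)^2 \quad (\text{by Lemma 3.6}) \\[2ex] & \le & \left( 2M\, \mathrm{Dist}_{c \cdot \pi_0}\ \dfrac{1 - \lambda_{k_i}}{\lambda_{k_i}} \right)^2 , \end{array} \tag{3.4} $$

where the final inequality is due to $c \cdot \pi_{k_i}$ ($i = 0, 1, \ldots$) being a decreasing sequence
(using Lemma 3.5).

Let $i'$ be the first sub-index for which $\lambda_{k_{i'}} \ge -\epsilon/(1 - \epsilon)$. Lemma 3.5 implies

$$ \frac{c \cdot \pi_{k_{i'}} - z^*}{c \cdot e - z^*} \le \epsilon . $$

Thus, to prove the theorem, it suffices to show $\ell = k_{i'}$ satisfies the inequality in the
statement of the theorem.

Note $i' > 0$ (because, by assumption, $\epsilon < \frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}$). Observe, then,

$$ \begin{array}{rcl} i' & < & 1 + \log_2\!\left( \dfrac{\lambda_{k_0}}{-\epsilon/(1 - \epsilon)} \right) \\[2ex] & = & 1 + \log_2\!\left( \dfrac{1 - \epsilon}{\epsilon} \right) + \log_2\!\left( \dfrac{\frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}}{1 - \frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}} \right) \end{array} \tag{3.5} $$

(again using Lemma 3.5).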
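Additionally, since $|\lambda_{k_i}| \ge 2^{\,i'-1-i}\, |\lambda_{k_{i'-1}}| > 2^{\,i'-1-i}\, \epsilon/(1 - \epsilon)$ for $i < i'$,

$$ \begin{array}{rcl} k_{i'} & = & \sum_{i=0}^{i'-1} (k_{i+1} - k_i) \\[1ex] & \le & (2M\, \mathrm{Dist}_{c \cdot \pi_0})^2 \sum_{i=0}^{i'-1} \left( \dfrac{1 - \lambda_{k_i}}{\lambda_{k_i}} \right)^2 \quad (\text{by (3.4)}) \\[2ex] & \le & (2M\, \mathrm{Dist}_{c \cdot \pi_0})^2 \sum_{i=0}^{i'-1} \left( \dfrac{1 + 2^i\, \frac{\epsilon}{1 - \epsilon}}{2^i\, \frac{\epsilon}{1 - \epsilon}} \right)^2 \\[2ex] & = & (2M\, \mathrm{Dist}_{c \cdot \pi_0})^2 \sum_{i=0}^{i'-1} \left( 1 + \dfrac{1 - \epsilon}{2^i\, \epsilon} \right)^2 \\[2ex] & \le & (2M\, \mathrm{Dist}_{c \cdot \pi_0})^2 \left( i' + 4\, \dfrac{1 - \epsilon}{\epsilon} + \dfrac43 \left( \dfrac{1 - \epsilon}{\epsilon} \right)^2 \right) . \end{array} $$

Using (3.5) to substitute for $i'$ completes the proof. □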

For our second algorithm, we discard the requirement of knowing $z^*$, the optimal
value for CP. Now we require that $\epsilon$ (the desired relative-accuracy) be input.


Algorithm 2:

(0) Input: $0 < \epsilon < 1$,
    $e$, a strictly feasible point for CP, and
    $x \in \mathrm{Affine}$ satisfying $c \cdot x < c \cdot e$.

    Initialize: $x_0 = \pi_0 = \pi(x)$.

(1) Iterate: $\tilde x_{k+1} := x_k + \frac{\epsilon}{2\|g_k\|^2}\, g_k$,
    where $g_k$ is the projection of a supgradient $\nabla\lambda_{\min}(x_k)$ onto $\mathcal{L}$.
    Let $\pi_{k+1} := \pi(\tilde x_{k+1})$.
    If $c \cdot (e - \pi_{k+1}) \ge \frac43\, c \cdot (e - \tilde x_{k+1})$, then let $x_{k+1} = \pi_{k+1}$;
    else, let $x_{k+1} = \tilde x_{k+1}$.
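A minimal Python sketch of Algorithm 2 follows, run on a toy linear program of our own (minimize $x_1 + 2x_2 + 3x_3 + 4x_4$ subject to $x_1 + x_2 + x_3 + x_4 = 4$, $x \ge 0$, with $e$ the all-ones vector). It assumes the standard inner product, the step $\epsilon/(2\|g_k\|^2)$ and the $4/3$ test as read from the proof of Theorem 3.8, and the supgradient $e(k)$ at an index attaining $\min_j x_j$. Function names are hypothetical; unlike Algorithm 1, no knowledge of $z^*$ is used.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_rows(rows):
    """Gram-Schmidt on the rows of [A; c^T] (assumed linearly independent)."""
    basis = []
    for r in rows:
        w = r[:]
        for q in basis:
            coef = dot(w, q)
            w = [wj - coef * qj for wj, qj in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wj / norm for wj in w])
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the orthogonal complement of span(basis)."""
    for q in basis:
        coef = dot(v, q)
        v = [vj - coef * qj for vj, qj in zip(v, q)]
    return v

def pi(x):
    """Projection from e = all-ones to the boundary of the orthant."""
    lam = min(x)
    return [1.0 + (xj - 1.0) / (1.0 - lam) for xj in x]

def algorithm_2(A_rows, c, eps, x_in, iters):
    n = len(c)
    ce = dot(c, [1.0] * n)
    basis = orthonormal_rows(A_rows + [c])
    x = pi(x_in)                                   # x_0 = pi_0 = pi(x)
    pis, objs = [x], [dot(c, x)]
    for _ in range(iters):
        k = x.index(min(x))
        g = project([1.0 if j == k else 0.0 for j in range(n)], basis)
        gg = dot(g, g)
        if gg < 1e-18:                             # zero projected supgradient
            break
        step = eps / (2.0 * gg)
        x_tilde = [xj + step * gj for xj, gj in zip(x, g)]
        p = pi(x_tilde)
        pis.append(p)
        # Jump to the boundary once the projection gains a factor 4/3.
        if ce - dot(c, p) >= (4.0 / 3.0) * (ce - dot(c, x_tilde)):
            x = p
        else:
            x = x_tilde
        objs.append(dot(c, x))
    return pis, objs

A_rows = [[1.0, 1.0, 1.0, 1.0]]
c = [1.0, 2.0, 3.0, 4.0]
pis, objs = algorithm_2(A_rows, c, eps=0.1, x_in=[2.0, 1.0, 0.5, 0.5], iters=2000)
```

Each $\pi_k$ is feasible for the LP, and the objective value $c \cdot x_k$ never increases: supgradient steps stay in the current $\mathrm{Affine}_z$, and a jump to the boundary is taken only when it lowers the objective.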


Unsurprisingly, the iteration bound we obtain for Algorithm 2 is worse than 
our result for Algorithm 1, but perhaps surprisingly, the bound is not excessively 
worse, in that the factor for 1/e 2 is essentially unchanged (it’s the factor for 1/e 
that increases, although typically not by an excessive amount). 


Theorem 3.8. Assume $0 < \epsilon < \frac{c \cdot \pi_0 - z^*}{c \cdot e - z^*}$. For the iterates of Algorithm 2,

£>8 (MDist c .,J 2 ^] 2 + l f log 4/3 ^ j + ^ 


$$ \Longrightarrow \quad \min_{k \le \ell} \frac{c \cdot \pi_k - z^*}{c \cdot e - z^*} \le \epsilon . $$
Proof: In order to distinguish the iterates obtained by projecting to the boundary,
we record a notationally-embellished rendition of the algorithm which introduces a
distinction between “inner iterations” and “outer iterations”:


Algorithm 2 (notationally-embellished version):

(0) Input: $0 < \epsilon < 1$, $e$ and $x$.

    Initialize: $y_{1,0} = \pi(x)$,
    $i = 1$ (outer iteration counter),
    $j = 0$ (inner iteration counter).

(1) Compute $y_{i,j+1} = y_{i,j} + \frac{\epsilon}{2\|g_{i,j}\|^2}\, g_{i,j}$,
    where $g_{i,j}$ is the projection of a supgradient $\nabla\lambda_{\min}(y_{i,j})$ onto $\mathcal{L}$.

(2) If $c \cdot (e - \pi(y_{i,j+1})) \ge \frac43\, c \cdot (e - y_{i,j+1})$,
    then let $y_{i+1,0} = \pi(y_{i,j+1})$, $i + 1 \to i$ and $0 \to j$;
    else, let $j + 1 \to j$.

(3) Go to step 1.

For each outer iteration $i$, all of the iterates $y_{i,j}$ have the same objective value.
Denote the value by $z_i$. Obviously, $z_1$ is equal to the value $c \cdot \pi_0$ appearing in the
statement of the theorem. Let

$$ \mathrm{Dist} := \mathrm{Dist}_{c \cdot \pi_0} = \mathrm{Dist}_{z_1} . $$

Step 2 ensures

$$ c \cdot e - z_{i+1} \ge \tfrac{4}{3}\, (c \cdot e - z_i) . \tag{3.6} $$

Thus, $z_1, z_2, \ldots$ is a strictly decreasing sequence. Consequently, as $y_{i,0} \in \mathrm{Level}_{z_i}$,
we have $\mathrm{dist}_{z_i}(y_{i,0}) \le \mathrm{Dist}$ for all $i$.

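From (3.6) we find for scalars $\delta > 0$ that

$$ \frac{c \cdot e - z_{i+1}}{c \cdot e - z^*} < \delta \quad\Rightarrow\quad i < \log_{4/3}\!\left( \frac{\delta}{\frac{c \cdot e - z_1}{c \cdot e - z^*}} \right) , $$

and thus, for $\epsilon < 1$,

$$ \frac{z_i - z^*}{c \cdot e - z^*} > \epsilon \quad\Rightarrow\quad i < 1 + \log_{4/3}\!\left( \frac{1 - \epsilon}{1 - \frac{z_1 - z^*}{c \cdot e - z^*}} \right) . \tag{3.7} $$

Hence, if an outer iteration $i$ fails to satisfy the inequality on the right, the initial
inner iterate $y_{i,0}$ fulfills the goal of finding a CP-feasible point $\pi$ satisfying $\frac{c \cdot \pi - z^*}{c \cdot e - z^*} \le \epsilon$
(i.e., the algorithm has been successful no later than the start of outer iteration
$i$). Also observe that (3.7) provides (letting $\epsilon \downarrow 0$) an upper bound on $I$, the total
number of outer iterations:

$$ I \le 1 + \log_{4/3}\!\left( \frac{1}{1 - \frac{z_1 - z^*}{c \cdot e - z^*}} \right) . \tag{3.8} $$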
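For readers who want the intermediate step, the logarithmic bounds can be unrolled as follows. This is only a sketch: it uses (3.6) and the positivity of $c \cdot e - z^*$, and nothing else.

```latex
% Iterating (3.6), starting from outer iteration 1:
c \cdot e - z_{i+1} \;\ge\; \left(\tfrac{4}{3}\right)^{i} (c \cdot e - z_1) .
% Divide by c.e - z^* > 0 and assume the left-hand side is below \delta:
\left(\tfrac{4}{3}\right)^{i} \, \frac{c \cdot e - z_1}{c \cdot e - z^*}
  \;\le\; \frac{c \cdot e - z_{i+1}}{c \cdot e - z^*} \;<\; \delta
\quad\Longrightarrow\quad
i \;<\; \log_{4/3}\!\left( \delta \Big/ \frac{c \cdot e - z_1}{c \cdot e - z^*} \right) .
% For (3.7), apply this with i-1 in place of i and \delta = 1 - \epsilon, using
\frac{c \cdot e - z_i}{c \cdot e - z^*} \;=\; 1 - \frac{z_i - z^*}{c \cdot e - z^*} .
```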


For $i = 1, \ldots, I$, let $J_i$ denote the number of inner iterates computed during
outer iteration $i$, that is, $J_i$ is the largest value $j$ for which $y_{i,j}$ is computed. Clearly,
$J_I = \infty$, whereas $J_1, \ldots, J_{I-1}$ are finite.

To ease notation, let $\lambda_{i,j} := \lambda_{\min}(y_{i,j})$, and let $\lambda_i^* := \lambda_{\min}(x^*_{z_i})$, the optimal
value of

$$ \begin{array}{rl} \max & \lambda_{\min}(x) \\ \text{s.t.} & x \in \mathrm{Affine}_{z_i} . \end{array} $$

According to Lemma 2.4,

$$ \lambda_i^* = \frac{z_i - z^*}{c \cdot e - z^*} . \tag{3.9} $$

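It is thus valid, for example, to substitute $\lambda_i^*$ for $\frac{z_i - z^*}{c \cdot e - z^*}$ in (3.7). Additionally, (3.9)
implies (3.6) to be equivalent to

$$ 1 - \lambda_{i+1}^* \ge \tfrac{4}{3}\, (1 - \lambda_i^*) . \tag{3.10} $$

For any point $y$, we have $\pi(y) = e + \frac{1}{1 - \lambda_{\min}(y)}\, (y - e)$, and thus,

$$ \frac{c \cdot e - c \cdot \pi(y)}{c \cdot e - c \cdot y} = \frac{1}{1 - \lambda_{\min}(y)} . $$

Hence,

$$ \frac{c \cdot e - c \cdot \pi(y)}{c \cdot e - c \cdot y} \ge \tfrac{4}{3} \quad\Leftrightarrow\quad \lambda_{\min}(y) \ge 1/4 . $$

Consequently,

$$ \lambda_{i,j} < 1/4 \quad \text{for } j < J_i . \tag{3.11} $$

We use the following relation implied by Lemma 3.2:

$$ \mathrm{dist}_{z_i}(y_{i,j+1})^2 \le \mathrm{dist}_{z_i}(y_{i,j})^2 - \frac{\epsilon}{\|g_{i,j}\|^2}\, (\lambda_i^* - \lambda_{i,j}) + \left( \frac{\epsilon}{2 \|g_{i,j}\|} \right)^2 . \tag{3.12} $$

We begin bounding the number of inner iterations by showing

$$ \lambda_i^* \ge \max\{\tfrac{1}{2}, \epsilon\} \quad\Rightarrow\quad J_i \le \frac{8\, (M\, \mathrm{Dist})^2}{\epsilon} . \tag{3.13} $$

Indeed, for $j < J_i$,

$$ \begin{aligned}
-\epsilon\, (\lambda_i^* - \lambda_{i,j}) + \tfrac{1}{4} \epsilon^2
 &\le -\epsilon \left( \max\{\tfrac{1}{2}, \epsilon\} - \tfrac{1}{4} \right) + \tfrac{1}{4} \epsilon^2 \quad \text{(using (3.11))} \\
 &= \min\{ \tfrac{1}{4} (\epsilon^2 - \epsilon),\ \tfrac{1}{4} \epsilon - \tfrac{3}{4} \epsilon^2 \} \\
 &\le \tfrac{3}{4} \cdot \tfrac{1}{4} (\epsilon^2 - \epsilon) + \tfrac{1}{4} \left( \tfrac{1}{4} \epsilon - \tfrac{3}{4} \epsilon^2 \right) \\
 &= -\tfrac{1}{8}\, \epsilon .
\end{aligned} $$

Thus, according to (3.12), for $j < J_i$,

$$ \mathrm{dist}_{z_i}(y_{i,j+1})^2 \le \mathrm{dist}_{z_i}(y_{i,j})^2 - \frac{\epsilon}{8 M^2} , $$

inductively giving

$$ \mathrm{dist}_{z_i}(y_{i,j+1})^2 \le \mathrm{dist}_{z_i}(y_{i,0})^2 - \frac{(j+1)\, \epsilon}{8 M^2} \le \mathrm{Dist}^2 - \frac{(j+1)\, \epsilon}{8 M^2} . $$

The implication (3.13) immediately follows.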
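The arithmetic behind (3.13) can be checked numerically. The sketch below evaluates, for a given $\epsilon$, the quantity $-\epsilon(\max\{\tfrac12, \epsilon\} - \tfrac14) + \tfrac14 \epsilon^2$, the two candidates $\tfrac14(\epsilon^2 - \epsilon)$ and $\tfrac14 \epsilon - \tfrac34 \epsilon^2$ whose minimum it equals, and a convex combination of the two. The weights $3/4$ and $1/4$ are an assumption of this sketch: they are the weights for which the combination collapses to $-\epsilon/8$, matching the constant 8 in the bound. The function names are hypothetical.

```python
def chain_terms(eps):
    """Return (bound after using lambda_ij < 1/4, min of the two candidates, convex combination)."""
    first = -eps * (max(0.5, eps) - 0.25) + 0.25 * eps ** 2
    a = 0.25 * (eps ** 2 - eps)            # candidate obtained from max{1/2, eps} = 1/2
    b = 0.25 * eps - 0.75 * eps ** 2       # candidate obtained from max{1/2, eps} = eps
    combo = 0.75 * a + 0.25 * b            # assumed weights 3/4 and 1/4
    return first, min(a, b), combo

def inner_iteration_cap(M, dist, eps):
    """Largest J compatible with 0 <= dist^2 - J * eps / (8 M^2)."""
    return 8.0 * (M * dist) ** 2 / eps

if __name__ == "__main__":
    for eps in (0.1, 0.5, 0.9):
        print(eps, chain_terms(eps), -eps / 8.0)
```

Since a minimum of two numbers never exceeds any convex combination of them, each line of the chain is at most the next, and the decrease per inner iteration is at least $\epsilon / (8M^2)$ whenever the projected supgradients have norm at most $M$.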

The theorem is now readily established in the case $\epsilon \ge 1/2$. Indeed, because
of the identity (3.9), the quantity on the right of (3.7) provides an upper bound
on the number of outer iterations $i$ for which $\lambda_i^* > \epsilon$, whereas the quantity on the
right of (3.13) gives, assuming $\epsilon \ge 1/2$, an upper bound on the number of inner
iterations for each of these outer iterations. However, for the first outer iteration
satisfying $\lambda_i^* \le \epsilon$, the initial iterate $y_{i,0}$ $(= \pi(y_{i,0}))$ itself achieves the desired
accuracy $\frac{c \cdot y_{i,0} - z^*}{c \cdot e - z^*} \le \epsilon$. Thus, the total number of inner iterations made before the
algorithm is successful is at most the product of the two quantities, which is seen
not to exceed the iteration bound in the statement of the theorem.


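It remains to consider the case $\epsilon < 1/2$.

For any outer iteration $i$ for which $\lambda_i^* < 3/4$ (and for any $0 < \epsilon < 1$), let

$$ \hat{J}_i := \left\lceil \left( \frac{M\, \mathrm{Dist}}{\epsilon} \right)^2 \frac{1}{\frac{3}{4} - \lambda_i^*} \right\rceil - 1 . $$

We claim that

$$ \text{either} \quad J_i \le \hat{J}_i \quad \text{or} \quad \min_{j \le \hat{J}_i} \frac{c \cdot \pi(y_{i,j}) - z^*}{c \cdot e - z^*} \le \epsilon . \tag{3.14} $$

Consequently, if $J_i > \hat{J}_i$, the algorithm will achieve the goal of computing a point
$y$ satisfying $\frac{c \cdot \pi(y) - z^*}{c \cdot e - z^*} \le \epsilon$ within $\hat{J}_i$ inner iterations during outer iteration $i$.

To establish (3.14), assume $\hat{J}_i < J_i$ and yet the inequality on the right of (3.14)
does not hold. (We obtain a contradiction.) For every $j \le \hat{J}_i$, Proposition 2.5 then
implies

$$ \lambda_i^* - \lambda_{i,j} > \frac{c \cdot e - z_i}{c \cdot e - z^*}\, \epsilon = (1 - \lambda_i^*)\, \epsilon \quad \text{(by (3.9))} , $$

and hence, using (3.12),

$$ \mathrm{dist}_{z_i}(y_{i,j+1})^2 < \mathrm{dist}_{z_i}(y_{i,j})^2 - \left( \tfrac{3}{4} - \lambda_i^* \right) (\epsilon / M)^2 , $$

from which inductively follows

$$ \begin{aligned}
\mathrm{dist}_{z_i}(y_{i,\hat{J}_i + 1})^2 &< \mathrm{dist}_{z_i}(y_{i,0})^2 - (\hat{J}_i + 1) \left( \tfrac{3}{4} - \lambda_i^* \right) (\epsilon / M)^2 \\
 &\le \mathrm{Dist}^2 - (\hat{J}_i + 1) \left( \tfrac{3}{4} - \lambda_i^* \right) (\epsilon / M)^2 \\
 &\le 0 ,
\end{aligned} $$

a contradiction. The claim is established.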
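The passage from (3.12) to the per-iteration decrease used in this argument can be spelled out as follows. The sketch assumes that the projected supgradients satisfy $\|g_{i,j}\| \le M$, that $\lambda_i^* < 3/4$, and that $\lambda_i^* - \lambda_{i,j} > (1 - \lambda_i^*)\, \epsilon$; the right-hand side of (3.12) is read as the previous squared distance plus the two terms on the first line below.

```latex
-\frac{\epsilon}{\|g_{i,j}\|^2}\, (\lambda_i^* - \lambda_{i,j})
  + \left( \frac{\epsilon}{2 \|g_{i,j}\|} \right)^2
\;<\; -\frac{\epsilon^2}{\|g_{i,j}\|^2} \left( (1 - \lambda_i^*) - \tfrac{1}{4} \right)
\;=\; -\frac{\epsilon^2}{\|g_{i,j}\|^2} \left( \tfrac{3}{4} - \lambda_i^* \right)
\;\le\; -\left( \tfrac{3}{4} - \lambda_i^* \right) \left( \frac{\epsilon}{M} \right)^2 .
```

The last step uses $\tfrac{3}{4} - \lambda_i^* > 0$, so that replacing $\|g_{i,j}\|$ by the larger value $M$ can only shrink the guaranteed decrease.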


Assume $\epsilon < 1/2$, the case remaining to be considered.

As each outer iteration $i$ satisfying $\lambda_i^* \ge 1/2$ has only finitely many inner iterations,
there must be at least one outer iteration $i$ satisfying $\lambda_i^* < 1/2$. Let $\hat{\imath}$ be the
first outer iteration for which $\lambda_{\hat{\imath}}^* < 1/2$. From (3.8) and (3.13), the total number
of inner iterations made before reaching outer iteration $\hat{\imath}$ is at most


8(MDist) 2 

e 



(3.15) 


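According to (3.14), during outer iteration $\hat{\imath}$, the algorithm either achieves its
goal within $\hat{J}_{\hat{\imath}}$ inner iterations, or the algorithm makes no more than $\hat{J}_{\hat{\imath}}$ inner
iterations before starting a new outer iteration. Assume the latter case. Then,
for outer iteration $\hat{\imath} + 1$, the algorithm either achieves its goal within $\hat{J}_{\hat{\imath}+1}$ inner
iterations, or the algorithm makes no more than $\hat{J}_{\hat{\imath}+1}$ inner iterations before starting
a new outer iteration. Assume the latter case. In iteration $\hat{\imath} + 2$, the algorithm
definitely achieves its goal within $\hat{J}_{\hat{\imath}+2}$ inner iterations, because there cannot be a
subsequent outer iteration due, by (3.10), to

$$ \tfrac{4}{3}\, (1 - \lambda_{\hat{\imath}+2}^*) \ge \left( \tfrac{4}{3} \right)^3 (1 - \lambda_{\hat{\imath}}^*) > \left( \tfrac{4}{3} \right)^3 \tfrac{1}{2} > 1 . $$

The total number of inner iterations made before the algorithm achieves its goal
is thus bounded by the sum of the quantity (3.15) and

$$ \begin{aligned}
\hat{J}_{\hat{\imath}} + \hat{J}_{\hat{\imath}+1} + \hat{J}_{\hat{\imath}+2}
 &\le \left( \frac{1}{\frac{1}{2} - \frac{1}{4}} + \frac{1}{\frac{4}{3} \cdot \frac{1}{2} - \frac{1}{4}} + \frac{1}{\left( \frac{4}{3} \right)^2 \frac{1}{2} - \frac{1}{4}} \right) \left( \frac{M\, \mathrm{Dist}}{\epsilon} \right)^2
 \quad \text{(using } \tfrac{3}{4} - \lambda_i^* = (1 - \lambda_i^*) - \tfrac{1}{4} \text{)} \\
 &\le 8 \left( \frac{M\, \mathrm{Dist}}{\epsilon} \right)^2 ,
\end{aligned} $$

completing the proof of the theorem. □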
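The two numerical facts used at the end of the proof can be confirmed with exact rational arithmetic. The sketch below assumes the lower bounds $1 - \lambda_{\hat{\imath}+k}^* \ge (4/3)^k \cdot \tfrac{1}{2}$ for $k = 0, 1, 2$, which follow from (3.10) once the first of these outer iterations has $\lambda^* < 1/2$; the function names are hypothetical.

```python
from fractions import Fraction

def three_term_constant():
    """Sum over k = 0, 1, 2 of 1 / ((4/3)^k * 1/2 - 1/4), as an exact fraction."""
    half, quarter, ratio = Fraction(1, 2), Fraction(1, 4), Fraction(4, 3)
    return sum(1 / (ratio ** k * half - quarter) for k in range(3))

def fourth_outer_iteration_impossible():
    """True when (4/3)^3 * 1/2 > 1, which would force 1 - lambda* > 1, i.e. lambda* < 0."""
    return Fraction(4, 3) ** 3 * Fraction(1, 2) > 1

if __name__ == "__main__":
    total = three_term_constant()
    print(total, float(total), fourth_outer_iteration_impossible())
```

The sum is slightly below 8, which is why the three final outer iterations together contribute at most $8 (M\, \mathrm{Dist} / \epsilon)^2$ inner iterations.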


4. General Convex Optimization 


We close with a general example meant to illustrate the flexibility of the preceding
development, an example which also serves to highlight a few key differences
between our approach and much of the literature on subgradient methods.

Let $f : \mathcal{E} \to (-\infty, \infty]$ be an extended-valued and lower-semicontinuous convex
function. Consider an optimization problem

$$ \begin{array}{rl} \min & f(x) \\ \text{s.t.} & x \in \mathrm{Feas} , \end{array} \tag{4.1} $$

where $\mathrm{Feas} = \{x \in S : Ax = b\}$ and $S$ is a closed, convex set with nonempty
interior. Assume $f^*$ - the optimal value - is finite.

We explore the complexity ramifications of solving (4.1) by converting it to 
an equivalent conic optimization problem and then applying Algorithm 2 of the 
preceding section (consideration of Algorithm 1 is exactly similar). 

Assume $\bar{x}$ is a known point lying in the interiors of both $S$ and $\mathrm{eff\_dom}(f)$, the
effective domain of $f$ (i.e., where $f$ is finite). For convenience of exposition, assume
$\bar{x}$ is not optimal for (4.1).

Assume $\mathcal{E}$ is endowed with an inner product $\langle\ ,\ \rangle$, and assume $\|\ \|$, the associated
norm, satisfies

$$ \{x : \|x - \bar{x}\| \le 1 \text{ and } Ax = b\} \subseteq \mathrm{Feas} \cap \mathrm{eff\_dom}(f) . \tag{4.2} $$

Let $\bar{f}$ be a known scalar satisfying $\bar{f} \ge f(x)$ for all $x$ in the set on the left of (4.2).
(The value $\bar{f}$ is required to be known because it together with $\bar{x}$ will determine the
distinguished direction $e$.)

Let $D$ denote the diameter of the sublevel set $\{x \in \mathrm{Feas} : f(x) \le f(\bar{x})\}$. Assume
$D$ is finite, implying the optimal value $f^*$ of (4.1) to be attained by some feasible
point.

For later reference, observe that assumption (4.2) and the convexity of $f$ imply

$$ f(\bar{x}) \le \frac{D}{D+1}\, \bar{f} + \frac{1}{D+1}\, f^* , $$

which in turn implies

$$ \frac{1}{1 - \frac{f(\bar{x}) - f^*}{\bar{f} - f^*}} \le D + 1 . \tag{4.3} $$
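One way to see where (4.3) comes from is sketched below. The auxiliary names $x^*$ (a minimizer, which exists because $D$ is finite), $d$ and $w$ are introduced only for this sketch.

```latex
% Let d := \|\bar{x} - x^*\|, so 0 < d \le D, and put
w := \bar{x} + \tfrac{1}{d} (\bar{x} - x^*) , \qquad \|w - \bar{x}\| = 1 , \quad A w = b ,
% hence f(w) \le \bar{f} by (4.2) and the choice of \bar{f}. Since
\bar{x} = \tfrac{1}{1+d}\, x^* + \tfrac{d}{1+d}\, w ,
% convexity of f gives
f(\bar{x}) \;\le\; \tfrac{1}{1+d}\, f^* + \tfrac{d}{1+d}\, \bar{f}
          \;\le\; \tfrac{1}{D+1}\, f^* + \tfrac{D}{D+1}\, \bar{f} ,
% the second step because f^* \le \bar{f} and d \mapsto d/(1+d) is increasing.
% Subtracting f^* and dividing by \bar{f} - f^* > 0:
\frac{f(\bar{x}) - f^*}{\bar{f} - f^*} \;\le\; \frac{D}{D+1}
\quad\Longrightarrow\quad
\frac{1}{1 - \frac{f(\bar{x}) - f^*}{\bar{f} - f^*}} \;\le\; D + 1 .
```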


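As $S$ is closed and convex, there exists a closed, convex cone $\mathcal{K}_1 \subseteq \mathcal{E} \times \mathbb{R}$ for
which $S = \{x : (x, 1) \in \mathcal{K}_1\}$.

For an extended-valued function to be lower semicontinuous is equivalent to its
epigraph being closed. Since the epigraph for $f$ is convex, there thus exists a closed,
convex cone $\mathcal{K}_2 \subseteq \mathcal{E} \times \mathbb{R} \times \mathbb{R}$ for which

$$ \mathrm{epi}(f) := \{(x, t) : f(x) \le t\} = \{(x, t) : (x, 1, t) \in \mathcal{K}_2\} . $$

Note

$$ t > f(\bar{x}) \quad\Rightarrow\quad (\bar{x}, 1, t) \in \mathrm{int}(\mathcal{K}_2) , \tag{4.4} $$

a consequence of the assumption $\bar{x} \in \mathrm{int}(\mathrm{eff\_dom}(f))$.

Let

$$ \mathcal{K} := \{(x, s, t) : (x, s) \in \mathcal{K}_1 \text{ and } (x, s, t) \in \mathcal{K}_2\} . $$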

Clearly, the optimization problem (4.1) is equivalent to

$$ \begin{array}{rl} \min_{x, s, t} & t \\ \text{s.t.} & Ax = b , \\ & s = 1 , \\ & (x, s, t) \in \mathcal{K} , \end{array} \tag{4.5} $$

and has the same optimal value, $f^*$. The conic program (4.5) is of the same form
as CP, the focus of preceding sections.

Observe for all scalars $t$,

$$ \mathrm{Level}_t = \{(x, 1, t) : x \in \mathrm{Feas} \text{ and } f(x) \le t\} . \tag{4.6} $$

To apply Algorithm 2, a distinguished direction $e$ is needed, and a computational
inner product should be specified. Additionally, an input point is required.

Let $e = (\bar{x}, 1, \bar{f})$, which lies in the interior of $\mathcal{K}$, due to (4.4) and $\bar{x} \in \mathrm{int}(S)$.
This distinguished direction, along with the cone $\mathcal{K}$, determine the map
$(x, s, t) \mapsto \lambda_{\min}(x, s, t)$ on $\mathcal{E} \times \mathbb{R} \times \mathbb{R}$.

Choose the computational inner product on $\mathcal{E} \times \mathbb{R} \times \mathbb{R}$ to be any inner product
that assigns to pairs $(x_1, 0, 0)$, $(x_2, 0, 0)$ the value $\langle x_1, x_2 \rangle$ (the original inner product
on $\mathcal{E}$). By (4.2) and (4.6), the level set containing $e$ then satisfies

$$ \mathrm{Level}_{\bar{f}} \cap B(e, 1) \subseteq \mathcal{K} , $$

and hence, by Proposition 3.1, the Lipschitz constant is at most 1 for the map
$(x, s, t) \mapsto \lambda_{\min}(x, s, t)$ restricted to $\mathrm{Affine}_t$, for every $t$.

Choose the input to Algorithm 2 as $(\bar{x}, 1, f(\bar{x}))$, which clearly is feasible
for the conic program (4.5). Note that (4.6) then implies the horizontal diameter
of the relevant sublevel set for the conic program satisfies

$$ \mathrm{Diam}_{f(\bar{x})} = D $$

(i.e., is equal to the diameter of the sublevel set $\{x \in \mathrm{Feas} : f(x) \le f(\bar{x})\}$).

Applying Algorithm 2 results in a sequence of iterates $(x_k, 1, t_k)$ for which the
projections $(x'_k, 1, t'_k) := \pi(x_k, 1, t_k)$ satisfy $x'_k \in \mathrm{Feas}$ and $f(x'_k) \le t'_k$ (simply
because $(x'_k, 1, t'_k)$ is feasible for the conic program (4.5)).

Since $(x_0, 1, t_0) = (\bar{x}, 1, f(\bar{x}))$, we have $(x'_0, 1, t'_0) = (x_0, 1, t_0)$ - in particular, we
have $t'_0 = f(\bar{x})$. Consequently, the sequence of points $x'_k$ not only lie in Feas, but
by Theorem 3.8 satisfy

$$ \ell \ge 8 D^2 \left( \frac{1}{\epsilon^2} + \frac{1}{\epsilon} \log_{4/3}\big( (D + 1)(1 - \epsilon) \big) + 1 \right) \tag{4.7} $$

$$ \Rightarrow \quad \min_{k \le \ell} \frac{f(x'_k) - f^*}{\bar{f} - f^*} \le \epsilon , \tag{4.8} $$

where for (4.7) we have used (4.3), and for (4.8) have used $f(x'_k) \le t'_k$.
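To get a feel for the size of the iteration count in (4.7), the right-hand side can be evaluated directly. The sketch below reads (4.7) as $8 D^2 \big( 1/\epsilon^2 + (1/\epsilon) \log_{4/3}((D+1)(1-\epsilon)) + 1 \big)$; the function name `iteration_bound` is hypothetical.

```python
import math

def iteration_bound(D, eps):
    """Evaluate 8 D^2 (1/eps^2 + (1/eps) * log_{4/3}((D + 1)(1 - eps)) + 1)."""
    if not (0.0 < eps < 1.0) or D <= 0.0:
        raise ValueError("need 0 < eps < 1 and D > 0")
    log_term = math.log((D + 1.0) * (1.0 - eps), 4.0 / 3.0)
    return 8.0 * D ** 2 * (1.0 / eps ** 2 + log_term / eps + 1.0)

if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        print(eps, iteration_bound(1.0, eps))
```

The inputs are only the diameter $D$ and the relative accuracy $\epsilon$; no Lipschitz constant for $f$ enters the computation.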

Deserving of emphasis is that the only projections made are onto the subspace 
{(x, s, t) : Ax = 0 and s = 0} (the same subspace at every iteration, a situation for 
which preprocessing is effective, especially if the number of equations is relatively 
small). This differs from much of the subgradient method literature where, for 
example, commonly required is projection onto Feas for each iterate landing outside 
Feas. (A projection swamps the cost of an iteration except for especially simple 
sets Feas, such as a box, a ball, or an affine space.) 


Another difference, deserving perhaps of even more emphasis, is that the bound
(4.7) is independent of a Lipschitz constant for $f$. In fact, no Lipschitz constant is
implied by the assumptions, as is seen by considering the family of univariate cases
in which $\mathrm{Feas} = \mathbb{R}$, $A = b = 0$, and $f$ is allowed to be any lower-semicontinuous
convex function with

$$ [0, 2] \subseteq \mathrm{eff\_dom}(f) \subseteq [0, \infty) , $$

and which is strictly increasing at 0. The assumptions are fulfilled by choosing
$\bar{x} = 1$ and $\bar{f} = f(2)$, and by using the standard inner product (i.e., multiplication),
in which case $D = 1$. The bound (4.7) is then $\ell \ge 8 \left( \frac{1}{\epsilon^2} + \frac{1}{\epsilon} \log_{4/3}(2(1 - \epsilon)) + 1 \right)$,
whereas the error bound (4.8) is $\min_{k \le \ell} \frac{f(x'_k) - f(0)}{f(2) - f(0)} \le \epsilon$. Clearly, even for the restriction of
$f$ to $[0, 2]$, nothing is implied about the Lipschitz constant other than trivial lower
bounds such as $L \ge \frac{1}{2} (f(2) - f(0))$.

For the approach presented herein, Lipschitz continuity matters only with regards
to the function $(x, s, t) \mapsto \lambda_{\min}(x, s, t)$ restricted to $\mathrm{Affine}_t$, which is guaranteed
to have Lipschitz constant at most 1 (due to assumption (4.2)). On the
other hand, the error bound (4.8) is measured relatively, whereas in traditional
subgradient-method literature relying on a Lipschitz constant for the objective
function $f$, error is specified absolutely.